摘要 :
Data stores and cloud services are typically accessed using a client-server paradigm wherein the client runs as part of an application process which is trying to access the data store or cloud service. This paper presents the desi...
展开
Data stores and cloud services are typically accessed using a client-server paradigm wherein the client runs as part of an application process which is trying to access the data store or cloud service. This paper presents the design and implementation of enhanced clients for improving both the functionality and performance of applications accessing data stores or cloud services. Our enhanced clients can improve performance via multiple types of caches, encrypt data for providing confidentiality before sending information to a server, and compress data for reducing the size of data transfers. Our clients can perform data analysis to allow applications to more effectively use cloud services. They also provide both synchronous and asynchronous interfaces. An asynchronous interface allows an application program to access a data store or cloud service and continue execution before receiving a response which can significantly improve performance. We present a Universal Data Store Manager (UDSM) which allows an application to access multiple different data stores and provides a common interface to each data store. The UDSM also can monitor the performance of different data stores. A workload generator allows users to easily determine and compare the performance of different data stores. We also present NLU-SA, an application for performing natural language understanding and sentiment analysis on text documents. NLU-SA is implemented on top of our enhanced clients and integrates text analysis with Web searching. We present results from NLU-SA on sentiment on the Web towards major companies and countries. We also present a performance analysis of our enhanced clients.
收起
摘要 :
The persistent growth of big data applications has being raising new challenges in managing large volumes of datasets with high scalability, confidentiality protection, and flexible types of search queries. In this paper, we propo...
展开
The persistent growth of big data applications has being raising new challenges in managing large volumes of datasets with high scalability, confidentiality protection, and flexible types of search queries. In this paper, we propose a secure design to disassemble the private dataset with the aim to store them across geographically distributed servers while supporting secure multi-client Boolean queries. In this design, the data owner encrypts the private database with the searchable index attributes. The encrypted dataset will be disassembled and distributed evenly across multiple servers by leveraging the property of a distributed index framework. By constructing an encryption structure, generating search tokens, and enabling parallel query, we show how the proposed design performs the secure while efficient Boolean search. These queries are not only limited to those initiated by the data owner but also can be extended to support multiple authorized clients, where each client is allowed to access a necessary part of the private database. In this stage, we advocate a non-interactive authorization scheme where data owner is not required to stay online to process the query request. Moreover, the query operation can be executed in parallel, which significantly improves the search efficiency. We formally characterize the leakage profile, which allow us to follow the existing security analysis method to demonstrate that our system can guarantee data confidentiality and query privacy. To validate our protocol, we implement a system prototype and evaluate the efficiency of our construction. Through experimental results, we demonstrate the effectiveness of our protocol in terms of data outsourcing time and Boolean query time.
收起
摘要 :
Computers can now store all forms of information: records, documents, images, sound recordings, videos, scientific data, and many new data formats. Society has made great strides in capturing, storing, managing, analyzing, and vis...
展开
Computers can now store all forms of information: records, documents, images, sound recordings, videos, scientific data, and many new data formats. Society has made great strides in capturing, storing, managing, analyzing, and visualizing this data. These tasks are generically called data management. This article sketches the evolution of data management systems. There have been six distinct phases in data management. Initially, data was manually processed. The next step used punched-card equipment and electromechanical machines to sort and tabulate millions of records. The third phase stored data on magnetic tape and used stored-program computers to perform batch processing on sequential files. The fourth phase introduced the concept of a database schema and on-line navigational access to the data. The fifth step automated access to relational databases and added distributed and client server processing. We are now in the early stages of sixth-generation systems that store richer data types, notably documents, images, voice, and video data. These sixth-generation systems are the storage engines for the emerging Internet and intranets. Early data management systems automated traditional information processing. Today they allow fast, reliable, and secure access to globally distributed data. Tomorrow's systems will access and summarize richer forms of data. It is argued that multimedia databases will be a cornerstone of cyberspace.
收起